THE CORRELATION BETWEEN A COMPONENT,
AND BETWEEN THE SUM OF TWO OR MORE
COMPONENTS, AND THE SUM OF THE
REMAINING COMPONENTS OF A VARIABLE.

By J. Arthur Harris, Carnegie Institution of Washington, Station 
for Experimental Evolution, Cold Spring Harbor, N. Y. 



Many quantitatively measured variables with which the 
statistician has to deal in both social and biological sciences 
are really composite in character. Thus the death rates of a 
series of districts are made up of mortalities due to a number 
of causes. The total milk production of a cow is the sum of 
the productions of individual lactation periods which may 
differ greatly in quantity. The total length of an organism 
is the sum of the lengths of its component parts. The annual 
egg production of a fowl in an egg laying competition is the 
sum of the records for individual months. 

Thus $x_1, x_2, x_3, \ldots, x_n$ are the components of the variable
$X$, where $X = \Sigma(x)$.

It is customary, and in many instances quite proper, to deal 
with all such constituent elements as quite independent vari- 
ables. When only one correlation involving two individual 
components is to be determined, a table may be formed in the 
usual manner. Cases may, however, arise in statistical analy- 
sis in which it is desirable to determine the correlation be- 
tween the magnitude of any individual constituent element, x, 
of the variable X and the sum of the remaining elements (X—x). 
Indeed, all possible measures of this kind may be needed, i.e.,
$r_{x_1(X-x_1)},\ r_{x_2(X-x_2)},\ r_{x_3(X-x_3)},\ \ldots,\ r_{x_n(X-x_n)}$.

If $X$ be large and the values of $x_1, x_2, x_3, \ldots, x_n$ variable,
the arithmetical routine may be troublesome, in fact
practically prohibitive.

The correlation between the sum of two components and the 
sum of the remaining components 




$r_{(x_1+x_2)(X-x_1-x_2)},\ \ldots,\ r_{(x_{n-1}+x_n)(X-x_{n-1}-x_n)},$

or between the sum of three or more components and the sum 
of the remaining components, e. g., 

$r_{(x_1+x_2+x_3)(X-x_1-x_2-x_3)},\ r_{(x_2+x_3+x_4)(X-x_2-x_3-x_4)},\ \ldots,$
$r_{(x_{n-2}+x_{n-1}+x_n)(X-x_{n-2}-x_{n-1}-x_n)},$

or any of the other possible permutations which increase 
rapidly with the number of components, may be quite as 
important, at least, as the correlation between any individual 
component and the sum of the remaining components. 

The determination of all such relationships is relatively 
simple providing the correlations between the components 
are known or their product moments have been determined. 

Since these methods may not suggest themselves to those 
not familiar with the many possible modifications of the corre- 
lation formulae it seems worth while to put the equations on 
record. 

The first requisite is the means and standard deviations of 
the sum of two or more components and of the difference be- 
tween the variable and the sum of two or more components. 
The principle is well known.* 

The value of $\sigma_X$ is known and constant throughout. The
variable values of $\sigma_{x_1}, \sigma_{x_2}, \sigma_{x_3}, \ldots, \sigma_{x_n}$ and
$r_{x_1X}, r_{x_2X}, r_{x_3X}, \ldots, r_{x_nX}$ have also been determined.
The standard deviation for the variable less any component, say the pth, is

$$\sigma^2_{(X-x_p)} = \sigma_X^2 + \sigma_{x_p}^2 - 2\,r_{x_pX}\,\sigma_X\,\sigma_{x_p}$$

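For the sum of any number of components the standard
deviation is given by

$$\sigma^2 = \sigma_{x_1}^2 + \sigma_{x_2}^2 + \ldots + \sigma_{x_n}^2 + 2\,r_{x_1x_2}\,\sigma_{x_1}\sigma_{x_2} + 2\,r_{x_1x_3}\,\sigma_{x_1}\sigma_{x_3} + \ldots + 2\,r_{x_px_q}\,\sigma_{x_p}\sigma_{x_q} + \ldots$$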
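The two formulae above can be checked numerically. The following is a minimal Python sketch, not part of the original paper: the data are made up, the helper names (`pop_sd`, `pop_corr`, `var_remainder`, `var_sum`) are hypothetical, and the divisor N is assumed throughout, as in the text.

```python
import math

def pop_mean(v):
    return sum(v) / len(v)

def pop_sd(v):
    """Standard deviation with divisor N."""
    m = pop_mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

def pop_corr(u, v):
    """Product-moment correlation with divisor N."""
    mu, mv = pop_mean(u), pop_mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (pop_sd(u) * pop_sd(v))

def var_remainder(sd_X, sd_xp, r_xpX):
    """Variance of (X - x_p) from sigma_X, sigma_xp and r_{xp X}."""
    return sd_X ** 2 + sd_xp ** 2 - 2.0 * r_xpX * sd_X * sd_xp

def var_sum(sds, corr):
    """Variance of a sum of components.

    sds  : list of component standard deviations
    corr : corr[i][j] is the correlation between components i and j
    """
    total = sum(s ** 2 for s in sds)
    for i in range(len(sds)):
        for j in range(i + 1, len(sds)):
            total += 2.0 * corr[i][j] * sds[i] * sds[j]
    return total

# Made-up data (an assumption): three components on N = 6 individuals.
x = [
    [2.0, 4.0, 4.0, 5.0, 7.0, 8.0],
    [1.0, 3.0, 2.0, 6.0, 5.0, 4.0],
    [9.0, 7.0, 8.0, 4.0, 6.0, 5.0],
]
X = [sum(col) for col in zip(*x)]

# Variance of X - x_1 by the formula, and directly from the differences.
formula = var_remainder(pop_sd(X), pop_sd(x[0]), pop_corr(x[0], X))
direct = pop_sd([t - a for t, a in zip(X, x[0])]) ** 2

# Variance of x_1 + x_2 + x_3 by the formula; it should equal the variance of X.
sds = [pop_sd(c) for c in x]
corr = [[pop_corr(a, b) for b in x] for a in x]
total_var = var_sum(sds, corr)
```

Since the three components here add up to X, `total_var` should agree with the variance of X, which gives a simple check on the inter-component correlations.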

In actual practise it is often most convenient to work from
the original summations.† Thus the moments $\Sigma(x_1)$, $\Sigma(x_1^2)$,
$\Sigma(x_2)$, $\Sigma(x_2^2)$, $\Sigma(x_3)$, $\Sigma(x_3^2)$, $\ldots$, $\Sigma(x_n)$, $\Sigma(x_n^2)$ and $\Sigma(X)$,
$\Sigma(X^2)$, with the product moments $\Sigma(x_1X)$, $\Sigma(x_2X)$, $\Sigma(x_3X)$,
$\ldots$, $\Sigma(x_nX)$, lead directly to the desired results.

* See, for example, R. Pearl, Biometrika, Vol. VI, pp. 437-438, 1909, and G. U. Yule, Introduction to the
Theory of Statistics, p. 208, 1911.

† Harris, J. Arthur. The arithmetic of the product moment method of calculating the coefficient of
correlation. American Naturalist, Vol. XLIV, pp. 693-699, 1910.

In the following section I give the equations in terms of the 
original moments or moment coefficients. 
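As an illustration of "working from the original summations", the sketch below tabulates every raw sum once so that later quantities can be assembled by addition alone. It is a hedged example rather than the author's procedure: the function name `tabulate_sums`, the dictionary keys and the data are all assumptions made for illustration.

```python
from itertools import combinations

def tabulate_sums(components):
    """Return the raw sums used by the formulae.

    components : list of n lists, each holding one component's N values
    """
    n = len(components)
    X = [sum(col) for col in zip(*components)]  # X is the sum of its components
    sums = {
        "N": len(X),
        "S_X": sum(X),                                   # Sigma(X)
        "S_XX": sum(t * t for t in X),                   # Sigma(X^2)
        "S_x": [sum(c) for c in components],             # Sigma(x_i)
        "S_xx": [sum(a * a for a in c) for c in components],            # Sigma(x_i^2)
        "S_xX": [sum(a * t for a, t in zip(c, X)) for c in components], # Sigma(x_i X)
        "S_xixj": {},                                    # Sigma(x_i x_j), i < j
    }
    for i, j in combinations(range(n), 2):
        sums["S_xixj"][(i, j)] = sum(
            a * b for a, b in zip(components[i], components[j])
        )
    return sums

# Made-up data (an assumption): three components on N = 4 individuals.
data = [[2, 4, 4, 5], [1, 3, 2, 6], [9, 7, 8, 5]]
sums = tabulate_sums(data)
```

Two identities give a quick arithmetic check on such a table: the values of `S_xX` should add up to `S_XX`, and `S_XX` should equal the sum of `S_xx` plus twice the sum of the `S_xixj` entries.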

The means and standard deviations for the individual com- 
ponents, say for example the pth component, are 

$$\bar{x}_p = \Sigma(x_p)/N, \qquad \sigma^2_{x_p} = \Sigma(x_p^2)/N - [\Sigma(x_p)/N]^2$$

The means of the sums of the remaining components are quite 
obviously 

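$$(\overline{X-x_1}) = [\Sigma(X) - \Sigma(x_1)]/N, \text{ etc.,}$$

while the standard deviations are given by

$$\sigma^2_{(X-x_1)} = \{\Sigma(X^2) - 2\Sigma(x_1X) + \Sigma(x_1^2)\}/N - (\overline{X-x_1})^2,$$

$$\sigma^2_{(X-x_2)} = \{\Sigma(X^2) - 2\Sigma(x_2X) + \Sigma(x_2^2)\}/N - (\overline{X-x_2})^2, \text{ etc.}$$
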
The mean for the sum of two components is 

$$(\overline{x_p+x_q}) = [\Sigma(x_p) + \Sigma(x_q)]/N$$

and so on for any number of components. 

For the sum of the remaining $(n-2)$, $(n-3)$ components the
means are

$$(\overline{X-x_p-x_q}) = [\Sigma(X) - \Sigma(x_p) - \Sigma(x_q)]/N,$$

and similarly for reductions due to the removal of 3, 4 or more 
components. 

For the sum of two components, $(x_p+x_q)$,

$$\sigma^2 = [\Sigma(x_p^2) + 2\Sigma(x_px_q) + \Sigma(x_q^2)]/N - (\overline{x_p+x_q})^2$$

For three components, $(x_m+x_p+x_q)$,

$$\sigma^2 = [\Sigma(x_m^2) + \Sigma(x_p^2) + \Sigma(x_q^2) + 2\Sigma(x_mx_p) + 2\Sigma(x_mx_q) + 2\Sigma(x_px_q)]/N - (\overline{x_m+x_p+x_q})^2$$

For four components, $(x_h+x_m+x_p+x_q)$,

$$\sigma^2 = [\Sigma(x_h^2) + \ldots + \Sigma(x_q^2) + 2\Sigma(x_hx_m) + \ldots + 2\Sigma(x_px_q)]/N - (\overline{x_h+\ldots+x_q})^2,$$

and so on for higher numbers of components. 
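The pattern for two, three and four components extends to any chosen set: add the sums of squares, add twice every pairwise product sum, divide by N and subtract the squared mean. A minimal sketch under that reading follows; the function name `subset_sum_mean_var`, the zero-based indices and the data are assumptions for illustration.

```python
from itertools import combinations

def subset_sum_mean_var(components, idx):
    """Mean and variance (divisor N) of the sum of the components listed in
    idx, built only from raw sums, sums of squares and product sums."""
    N = len(components[0])
    # Sum over chosen components of Sigma(x_i)
    s1 = sum(sum(components[i]) for i in idx)
    # Sum over chosen components of Sigma(x_i^2)
    s2 = sum(sum(a * a for a in components[i]) for i in idx)
    # Sum over chosen pairs i < j of Sigma(x_i x_j)
    cross = sum(
        sum(a * b for a, b in zip(components[i], components[j]))
        for i, j in combinations(idx, 2)
    )
    mean = s1 / N
    var = (s2 + 2 * cross) / N - mean ** 2
    return mean, var

# Made-up data (an assumption): four components on N = 4 individuals.
components = [[2, 4, 4, 5], [1, 3, 2, 6], [9, 7, 8, 5], [3, 1, 4, 1]]

mean2, var2 = subset_sum_mean_var(components, [0, 1])     # two components
mean3, var3 = subset_sum_mean_var(components, [0, 1, 2])  # three components
```

For these data the sum of the first two components takes the values 3, 7, 6, 11, and the sum of the first three takes the values 12, 14, 14, 16, so the results can be confirmed by hand.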

The values of the means of the $(n-2)$, $(n-3)$, $(n-4)$
remaining components have been indicated.

The standard deviation of the value which remains after
deducting the values of two components $x_p$ and $x_q$ is

$$\sigma^2 = [\Sigma(X^2) + \Sigma(x_p^2) + \Sigma(x_q^2) - 2\Sigma(x_pX) - 2\Sigma(x_qX) + 2\Sigma(x_px_q)]/N - (\overline{X-x_p-x_q})^2$$

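The value of $\sigma_{(X-x_m-x_p-x_q)}$ is given by

$$\begin{aligned}
\sigma^2 = [&\Sigma(X^2) + \Sigma(x_m^2) + \Sigma(x_p^2) + \Sigma(x_q^2) - 2\Sigma(x_mX) - 2\Sigma(x_pX) - 2\Sigma(x_qX) \\
&+ 2\Sigma(x_mx_p) + 2\Sigma(x_mx_q) + 2\Sigma(x_px_q)]/N - (\overline{X-x_m-x_p-x_q})^2
\end{aligned}$$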

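The process for the value remaining after the removal of
four components is

$$\begin{aligned}
\sigma^2 = [&\Sigma(X^2) + \Sigma(x_h^2) + \ldots + \Sigma(x_q^2) - 2\Sigma(x_hX) - \ldots - 2\Sigma(x_qX) \\
&+ 2\Sigma(x_hx_m) + \ldots + 2\Sigma(x_px_q)]/N - (\overline{X-x_h-\ldots-x_q})^2
\end{aligned}$$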

I now turn to the correlations. 

Consider first of all the simplest of the two problems, the 
determination of the correlation between a component and 
the sum of the remaining components of the variable. 

The correlations between the variable and its constituent
elements, $r_{x_1X}, r_{x_2X}, r_{x_3X}, \ldots, r_{x_nX}$, are often wanted for
themselves, and in any instance are relatively easily determined.
The regression of $X$ on any of its constituent elements, say
on the pth component, is

$$r_{x_pX}\,\frac{\sigma_X}{\sigma_{x_p}}$$

Obviously the regression slope of $(X-x_p)$ on $x_p$ is

$$r_{x_p(X-x_p)}\,\frac{\sigma_{(X-x_p)}}{\sigma_{x_p}} = r_{x_pX}\,\frac{\sigma_X}{\sigma_{x_p}} - 1;$$

or in terms of correlation

$$r_{x_p(X-x_p)} = r_{x_pX}\,\frac{\sigma_X}{\sigma_{(X-x_p)}} - \frac{\sigma_{x_p}}{\sigma_{(X-x_p)}}$$
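In practice this means the correlation between a component and the remainder needs only three constants: $r_{x_pX}$, $\sigma_X$ and $\sigma_{x_p}$, since $\sigma_{(X-x_p)}$ follows from the same three. A minimal numerical sketch is given below; the helper names and the data are assumptions, and the divisor N is used throughout.

```python
import math

def pop_sd(v):
    """Standard deviation with divisor N."""
    m = sum(v) / len(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

def pop_corr(u, v):
    """Product-moment correlation with divisor N."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (pop_sd(u) * pop_sd(v))

def corr_component_vs_rest(r_xpX, sd_X, sd_xp):
    """Correlation between x_p and (X - x_p) from r_{xp X}, sigma_X, sigma_xp."""
    # sigma of the remainder, from the variance formula for X less one component
    sd_rest = math.sqrt(sd_X ** 2 + sd_xp ** 2 - 2.0 * r_xpX * sd_X * sd_xp)
    return r_xpX * sd_X / sd_rest - sd_xp / sd_rest

# Made-up data (an assumption): three components on N = 6 individuals.
x = [
    [2.0, 4.0, 4.0, 5.0, 7.0, 8.0],
    [1.0, 3.0, 2.0, 6.0, 5.0, 4.0],
    [9.0, 7.0, 8.0, 4.0, 6.0, 5.0],
]
X = [sum(col) for col in zip(*x)]
rest = [t - a for t, a in zip(X, x[0])]  # X - x_1, formed directly

via_formula = corr_component_vs_rest(pop_corr(x[0], X), pop_sd(X), pop_sd(x[0]))
direct = pop_corr(x[0], rest)
```

The value from the formula should match the correlation computed directly from the differences, up to rounding.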

I now turn to the formulae necessary for the determination
of the correlation between the sum of any two components,
say $x_p$ and $x_q$, and the sum of the remaining components,
$(X-x_p-x_q)$, of any three components, say $x_m$, $x_p$ and $x_q$, and
the sum of the remaining components $(X-x_m-x_p-x_q)$, and between
the sum of any four components, say $x_h$, $x_m$, $x_p$ and $x_q$, and
the sum of the remaining components, $(X-x_h-x_m-x_p-x_q)$.

The product moment for the sum of two variables $(x_p+x_q)$
and the sum of the remaining components $(X-x_p-x_q)$ is

$$\Sigma(x_pX) + \Sigma(x_qX) - 2\Sigma(x_px_q) - \Sigma(x_p^2) - \Sigma(x_q^2)$$

For the sum of three components, $(x_m+x_p+x_q)$, and the
sum of the remaining components, $(X-x_m-x_p-x_q)$, the
product moment is

$$\begin{aligned}
&\Sigma(x_mX) + \Sigma(x_pX) + \Sigma(x_qX) - 2\Sigma(x_mx_p) - 2\Sigma(x_mx_q) - 2\Sigma(x_px_q) \\
&\quad - \Sigma(x_m^2) - \Sigma(x_p^2) - \Sigma(x_q^2)
\end{aligned}$$

The product moment for four components $(x_h+x_m+x_p+x_q)$
and the sum of the remaining components is of course

$$\Sigma[(x_h+x_m+x_p+x_q)(X-x_h-x_m-x_p-x_q)]$$

which is easily thrown into the convenient form illustrated 
above for two and three components. 
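Combining the product moment with the means and standard deviations already given yields the coefficient itself. The sketch below assumes the usual product-moment definition, $r = [\Sigma(ab)/N - \bar{a}\bar{b}]/(\sigma_a\sigma_b)$, which the text takes for granted; the function name `corr_subset_vs_rest`, the zero-based indices and the data are illustrative assumptions.

```python
import math
from itertools import combinations

def corr_subset_vs_rest(components, idx):
    """Correlation between the sum of the components in idx and the sum of
    the remaining components, assembled from raw sums only."""
    N = len(components[0])
    X = [sum(col) for col in zip(*components)]
    S_X = sum(X)
    S_XX = sum(t * t for t in X)
    s1 = sum(sum(components[i]) for i in idx)                           # Sigma(x_i)
    s2 = sum(sum(a * a for a in components[i]) for i in idx)            # Sigma(x_i^2)
    sX = sum(sum(a * t for a, t in zip(components[i], X)) for i in idx) # Sigma(x_i X)
    cross = sum(
        sum(a * b for a, b in zip(components[i], components[j]))
        for i, j in combinations(idx, 2)
    )                                                                   # Sigma(x_i x_j), i < j
    # Raw product moment: Sigma[(chosen sum)(X - chosen sum)]
    prod = sX - 2 * cross - s2
    mean_a = s1 / N
    mean_b = (S_X - s1) / N
    var_a = (s2 + 2 * cross) / N - mean_a ** 2
    var_b = (S_XX + s2 - 2 * sX + 2 * cross) / N - mean_b ** 2
    cov = prod / N - mean_a * mean_b
    return cov / math.sqrt(var_a * var_b)

# Made-up data (an assumption): four components on N = 4 individuals.
components = [[2, 4, 4, 5], [1, 3, 2, 6], [9, 7, 8, 5], [3, 1, 4, 1]]

# Sum of the first two components against the sum of the other two.
r = corr_subset_vs_rest(components, [0, 1])
```

Here the two sums are 3, 7, 6, 11 and 12, 8, 12, 6, so the coefficient can be verified by forming the sums directly.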

In certain cases the coefficient of correlation between a 
single component, say x p , and the sum of the components 
remaining after the deduction of the pth and one or more 
other components is desired. Or, more generally, the coef- 
ficient of correlation between the sum of n components and the 
sum of the components remaining after the deduction of 
$n+1$, $n+2$, $n+3$, $\ldots$ components may be required.

The moments from which the means and standard devia- 
tions of the various components, sums of components and 
differences may be computed, have already been indicated. 
Thus the product moments only are required. 

For the correlation between a single component, $x_p$, and the
variable less two components, $x_p$ and $x_q$, the product moment is

$$\Sigma(x_pX) - \Sigma(x_p^2) - \Sigma(x_px_q)$$

For the correlation between a component, $x_p$, and the variable
less three components, $(X-x_p-x_q-x_m)$, the product
moment is

$$\Sigma(x_pX) - \Sigma(x_p^2) - \Sigma(x_px_q) - \Sigma(x_px_m)$$

The formula for the product moment underlying the correlation
between a component and the variable less the sum of a
larger number of components is obvious.

For the sum of two components, $(x_p+x_q)$, and the value of
the variable less three components, $(X-x_p-x_q-x_m)$, the
product moment is

$$\Sigma(x_pX) + \Sigma(x_qX) - \Sigma(x_p^2) - \Sigma(x_q^2) - 2\Sigma(x_px_q) - \Sigma(x_px_m) - \Sigma(x_qx_m)$$

Correlations beyond these for the sum of three components,
$(x_p+x_q+x_m)$, and the value of the variable less that of four
components, $(X-x_p-x_q-x_m-x_h)$, will be required very rarely
indeed. The product moment for such a relationship is

$$\begin{aligned}
&\Sigma(x_pX) + \Sigma(x_qX) + \Sigma(x_mX) - \Sigma(x_p^2) - \Sigma(x_q^2) - \Sigma(x_m^2) - 2\Sigma(x_px_q) \\
&\quad - 2\Sigma(x_px_m) - 2\Sigma(x_qx_m) - \Sigma(x_px_h) - \Sigma(x_qx_h) - \Sigma(x_mx_h)
\end{aligned}$$
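All of these expansions follow one rule: take $\Sigma(x_iX)$ for each component in the first set, then subtract $\Sigma(x_ix_j)$ for every pairing of a component $i$ in the first set with a component $j$ deducted from $X$. A hedged sketch of that rule follows; the function name `product_moment`, the zero-based index sets `A` and `B` and the data are assumptions for illustration.

```python
def product_moment(components, A, B):
    """Raw product moment Sigma[(sum of x_i, i in A) * (X - sum of x_j, j in B)].

    A : indices of the components summed on one side
    B : indices of the components deducted from X on the other side
    """
    X = [sum(col) for col in zip(*components)]

    def S(u, v):
        # Raw product sum Sigma(u v)
        return sum(a * b for a, b in zip(u, v))

    total = sum(S(components[i], X) for i in A)
    # Each (i, j) pairing is subtracted once; when i and j both lie in A the
    # pair occurs twice, which gives the coefficient 2 in the expansions,
    # and i == j gives the Sigma(x_i^2) terms.
    total -= sum(S(components[i], components[j]) for i in A for j in B)
    return total

# Made-up data (an assumption): four components on N = 4 individuals.
components = [[2, 4, 4, 5], [1, 3, 2, 6], [9, 7, 8, 5], [3, 1, 4, 1]]

# One component against the variable less two components.
pm_one_two = product_moment(components, [0], [0, 1])
# Sum of two components against the variable less three components.
pm_two_three = product_moment(components, [0, 1], [0, 1, 2])
```

Setting `B` equal to `A` recovers the product moments for a sum of components against the sum of all the remaining components.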

Thus the formulae for the determination of all of the funda- 
mental constants required in the calculation of the coefficient 
of correlation between the sum of from two to four compon- 
ents and the sum of the remaining components have been 
deduced. 

Some of these results are known, others are such as anyone 
familiar with elementary statistical theory might write for 
himself. My object here has not been to express the rela- 
tionships in the most elegant form but in that most convenient 
for practical work. The computer will note the great advan- 
tages of determining the moments and product moments for 
the individual components once for all. It is often just such 
simplification of method that greatly decreases the labor 
involved in the routine of the computing room. 



